Smart Paper Notes

IDLat: An Importance-Driven Latent Generation Method for Scientific Data

Jingyi Shen , Haoyu Li , Jiayi Xu , Ayan Biswas , Han-Wei Shen

Category: Machine Learning

2022-08-05

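Deep learning based latent representations have been widely used in numerous scientific visualization applications, such as isosurface similarity analysis, volume rendering, flow field synthesis, and data reduction, to name a few. However, existing latent representations are mostly generated from raw data in an unsupervised manner, which makes it difficult to incorporate domain interest to control the size of the latent representations and the quality of the reconstructed data. In this paper, we present a novel importance-driven latent representation to facilitate domain-interest-guided scientific data visualization and analysis. We use spatial importance maps to represent various scientific interests and take them as the input to a feature transformation network that guides latent generation. We further reduce the latent size with a lossless entropy encoding algorithm trained together with the autoencoder, improving storage and memory efficiency. We qualitatively and quantitatively evaluate the effectiveness and efficiency of the latents generated by our method using data from multiple scientific visualization applications.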

Translated by Google Translate

DocSegTr: An Instance-Level End-to-End Document Image Segmentation Transformer

Sanket Biswas , Ayan Banerjee , Josep Lladós , Umapada Pal

Category: Computer Vision

2022-01-27

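Understanding documents with rich layouts is an essential step towards information extraction. Business intelligence processes often require the extraction of useful semantic content from documents at a large scale for subsequent decision-making tasks. In this context, instance-level segmentation of different document objects (titles, sections, figures, etc.) has emerged as an interesting problem for the document analysis and understanding community. To advance research in this direction, we present a transformer-based model called DocSegTr for end-to-end instance segmentation of complex layouts in document images. The method adapts a twin attention module for semantic reasoning, which makes it highly computationally efficient compared with the state of the art. To the best of our knowledge, this is the first work on transformer-based document segmentation. Extensive experiments on competitive benchmarks such as PubLayNet, PRIMA, Historical Japanese, and TableBank show that our model achieves comparable or better segmentation performance than existing state-of-the-art approaches, with average precision of 89.4, 40.4, 40.3, 83.4 and 93.33. This simple and flexible framework could serve as a promising baseline for instance-level recognition tasks in document images.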

The Ties that matter: From the perspective of Similarity Measure in Online Social Networks

Soumita Das , Anupam Biswas

Category: Machine Learning

2022-12-21

Online social networks have highlighted the importance of connection strength measures, which have a broad array of applications such as analyzing diffusion behaviors, community detection, link prediction, and recommender systems. Though there are some existing connection strength measures, the density that a connection shares with its neighbors and the directionality aspect have not received much attention. In this paper, we propose an asymmetric edge similarity measure, namely Neighborhood Density-based Edge Similarity (NDES), which provides fundamental support for deriving the strength of a connection. The time complexity of NDES is $O(nk^2)$. An application of NDES to community detection in social networks is shown. We consider a similarity-based community detection technique and substitute its similarity measure with NDES. The performance of NDES is evaluated on several small real-world datasets in terms of its effectiveness in detecting communities and compared with three widely used similarity measures. Empirical results show that NDES enables the detection of comparatively better communities in terms of both accuracy and quality.

DCC: A Cascade based Approach to Detect Communities in Social Networks

Soumita Das , Anupam Biswas , Akrati Saxena

Category: Computer Vision | Machine Learning

2022-12-21

Community detection in social networks is associated with finding and grouping the most similar nodes inherent in the network. These similar nodes are identified by computing tie strength. Stronger ties indicate higher proximity shared by connected node pairs. This work is motivated by Granovetter's argument that strong ties lie within densely connected nodes and by the theory that community cores in real-world networks are densely connected. In this paper, we introduce a novel method called Disjoint Community detection using Cascades (DCC), which demonstrates the effectiveness of a new local density based tie strength measure for detecting communities. Here, tie strength is used to decide the paths followed for propagating information. The idea is to crawl through the tuple information of cascades towards the community core, guided by increasing tie strength. Building on the cascade generation step, a novel preferential membership method has been developed to assign community labels to unassigned nodes. The efficacy of DCC has been analyzed in terms of quality and accuracy on several real-world datasets and against baseline community detection algorithms.

Beyond Information Exchange: An Approach to Deploy Network Properties for Information Diffusion

Soumita Das , Anupam Biswas , Ravi Kishore Devarapalli

Category: Computer Vision

2022-12-21

Information diffusion in online social networks is a new and crucial problem in the field of social network analysis and requires significant research attention. Efficient diffusion of information is of critical importance in diverse situations such as pandemic prevention, advertising, and marketing. Although several mathematical models have been developed to date, previous works lacked systematic analysis and exploration of the influence of the neighborhood on information diffusion. In this paper, we propose the Common Neighborhood Strategy (CNS) algorithm for information diffusion, which demonstrates the role of the common neighborhood in information propagation throughout the network. The performance of the CNS algorithm is evaluated on several real-world datasets in terms of diffusion speed and diffusion outspread and compared with several widely used information diffusion models. Empirical results show that the CNS algorithm enables better information diffusion in terms of both diffusion speed and diffusion outspread.

Direct Comparative Analysis of Nature-inspired Optimization Algorithms on Community Detection Problem in Social Networks

Soumita Das , Bijita Singha , Alberto Tonda , Anupam Biswas

Category: Computer Vision | Neural and Evolutionary Computing

2022-12-21

Nature-inspired optimization algorithms (NIOAs) are nowadays a popular choice for community detection in social networks. The community detection problem in a social network is treated as an optimization problem, where the objective is either to maximize the connections within communities or to minimize the connections between communities. To apply NIOAs, either one of the two objectives or both are explored. Since NIOAs mostly exploit randomness in their strategies, it is necessary to analyze their performance for specific applications. In this paper, NIOAs are analyzed on the community detection problem. A direct comparison approach is followed to perform pairwise comparison of NIOAs. The performance is measured in terms of five scores designed based on the Prasatul matrix and also with average isolability. Three widely used real-world social networks and four NIOAs are considered for analyzing the quality of the communities generated by NIOAs.
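
The two objectives named in this abstract can be made concrete with a small sketch. The toy graph, the two partitions, and the function names are illustrative assumptions and do not come from the paper. Modularity (the standard Newman-Girvan definition) is shown only as one common example of an objective that rewards intra-community edges; the abstract does not say which objective functions the study uses.

```python
from collections import defaultdict

def edge_counts(edges, membership):
    """Count edges inside communities and edges crossing between them."""
    intra = sum(1 for u, v in edges if membership[u] == membership[v])
    return intra, len(edges) - intra

def modularity(edges, membership):
    """Newman-Girvan modularity for an undirected, unweighted graph."""
    m = len(edges)
    degree_sum = defaultdict(int)  # total degree of the nodes in each community
    intra = defaultdict(int)       # number of edges inside each community
    for u, v in edges:
        degree_sum[membership[u]] += 1
        degree_sum[membership[v]] += 1
        if membership[u] == membership[v]:
            intra[membership[u]] += 1
    return sum(intra[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)

# Hypothetical toy graph: two triangles joined by a single bridge edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
good = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}  # one triangle per community
bad = {0: "a", 1: "b", 2: "a", 3: "b", 4: "a", 5: "b"}   # arbitrary alternating labels

print(edge_counts(edges, good), modularity(edges, good))
print(edge_counts(edges, bad), modularity(edges, bad))
```

An optimizer searching over `membership` assignments would prefer `good`, which has more intra-community edges, fewer crossing edges, and higher modularity than `bad`.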

Forecasting formation of a Tropical Cyclone Using Reanalysis Data

Sandeep Kumar , Koushik Biswas , Ashish Kumar Pandey

Category: Artificial Intelligence | Computer Vision | Machine Learning

2022-12-10

The tropical cyclone formation process is one of the most complex natural phenomena and is governed by various atmospheric, oceanographic, and geographic factors that vary with time and space. Despite several years of research, accurately predicting tropical cyclone formation remains a challenging task. While the existing numerical models have inherent limitations, the machine learning models fail to capture the spatial and temporal dimensions of the causal factors behind tropical cyclone (TC) formation. In this study, a deep learning model is proposed that can forecast the formation of a tropical cyclone with a lead time of up to 60 hours with high accuracy. The model uses the high-resolution reanalysis data ERA5 (ECMWF reanalysis 5th generation) and the best track data IBTrACS (International Best Track Archive for Climate Stewardship) to forecast tropical cyclone formation in six ocean basins of the world. For a 60-hour lead time, the models achieve an accuracy in the range of 86.9% to 92.9% across the six ocean basins. The model takes about 5 to 15 minutes of training time, depending on the ocean basin and the amount of data used, and can predict within seconds, making it suitable for real-life usage.

Fairify: Fairness Verification of Neural Networks

Sumon Biswas , Hridesh Rajan

Category: Machine Learning | Artificial Intelligence

2022-12-08

Fairness of machine learning (ML) software has become a major concern in the recent past. Although recent research on testing and improving fairness has demonstrated impact on real-world software, fairness guarantees are still lacking in practice. Certification of ML models is challenging because of the complex decision-making process of the models. In this paper, we propose Fairify, an SMT-based approach to verify the individual fairness property in neural network (NN) models. Individual fairness ensures that any two similar individuals get similar treatment irrespective of their protected attributes, e.g., race, sex, age. Verifying this fairness property is hard because of the global checking and the non-linear computation nodes in NNs. We propose a sound approach to make individual fairness verification tractable for developers. The key idea is that many neurons in the NN always remain inactive when a smaller part of the input domain is considered. So, Fairify leverages whitebox access to the models in production and then applies pruning based on formal analysis. Our approach adopts input partitioning and then prunes the NN for each partition to provide a fairness certification or a counterexample. We leverage interval arithmetic and an activation heuristic of the neurons to perform the pruning as necessary. We evaluated Fairify on 25 real-world neural networks collected from four different sources and demonstrated its effectiveness, scalability, and performance over the baseline and closely related work. Fairify is also configurable based on the domain and size of the NN. Our novel formulation of the problem can answer targeted verification queries with relaxations and counterexamples, which have practical implications.
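
The key idea above, that restricting the input domain leaves more ReLU neurons provably inactive, can be illustrated with plain interval arithmetic. This is a minimal sketch under stated assumptions and not Fairify's implementation: the single layer, its weights, the input boxes, and the function names are all hypothetical, and the SMT-based verification step is not shown.

```python
def interval_affine(weights, bias, lower, upper):
    """Bound each pre-activation w.x + b over the input box [lower, upper]."""
    bounds = []
    for w_row, b in zip(weights, bias):
        lo = hi = b
        for w, l, u in zip(w_row, lower, upper):
            # A non-negative weight is minimized at l, a negative one at u.
            lo += w * l if w >= 0 else w * u
            hi += w * u if w >= 0 else w * l
        bounds.append((lo, hi))
    return bounds

def always_inactive(bounds):
    """Indices of ReLU neurons whose pre-activation can never exceed zero."""
    return [i for i, (_, hi) in enumerate(bounds) if hi <= 0]

# Hypothetical layer with 2 inputs and 3 neurons.
weights = [[1.0, -1.0], [-2.0, -1.0], [0.5, 0.5]]
bias = [0.0, -1.0, -3.0]

# Bounds over the whole input domain and over one smaller partition of it.
full = interval_affine(weights, bias, lower=[-1.0, -1.0], upper=[1.0, 1.0])
part = interval_affine(weights, bias, lower=[0.0, 0.0], upper=[1.0, 1.0])

print(always_inactive(full))  # neurons prunable on the full domain
print(always_inactive(part))  # neurons prunable on the partition
```

In this toy layer only one neuron is provably inactive over the full box, while two are inactive over the partition, so a verifier working per partition would face a smaller network.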

Towards Understanding Fairness and its Composition in Ensemble Machine Learning

Usman Gohar , Sumon Biswas , Hridesh Rajan

Category: Machine Learning

2022-12-08

Machine Learning (ML) software has been widely adopted in modern society, with reported fairness implications for minority groups based on race, sex, age, etc. Many recent works have proposed methods to measure and mitigate algorithmic bias in ML models. The existing approaches focus on single classifier-based ML models. However, real-world ML models are often composed of multiple independent or dependent learners in an ensemble (e.g., Random Forest), where the fairness composes in a non-trivial way. How does fairness compose in ensembles? What are the fairness impacts of the learners on the ultimate fairness of the ensemble? Can fair learners result in an unfair ensemble? Furthermore, studies have shown that hyperparameters influence the fairness of ML models. Ensemble hyperparameters are more complex since they affect how learners are combined in different categories of ensembles. Understanding the impact of ensemble hyperparameters on fairness will help programmers design fair ensembles. Today, we do not understand these fully for different ensemble algorithms. In this paper, we comprehensively study popular real-world ensembles: bagging, boosting, stacking and voting. We have developed a benchmark of 168 ensemble models collected from Kaggle on four popular fairness datasets. We use existing fairness metrics to understand the composition of fairness. Our results show that ensembles can be designed to be fairer without using mitigation techniques. We also identify the interplay between fairness composition and data characteristics to guide fair ensemble design. Finally, our benchmark can be leveraged for further research on fair ensembles. To the best of our knowledge, this is one of the first and largest studies on fairness composition in ensembles yet presented in the literature.

Improved Deep Neural Network Generalization Using m-Sharpness-Aware Minimization

Kayhan Behdin , Qingquan Song , Aman Gupta , David Durfee , Ayan Acharya , Sathiya Keerthi , Rahul Mazumder

Category: Machine Learning

2022-12-07

Modern deep learning models are over-parameterized, where the optimization setup strongly affects the generalization performance. A key element of reliable optimization for these systems is the modification of the loss function. Sharpness-Aware Minimization (SAM) modifies the underlying loss function to guide descent methods towards flatter minima, which arguably have better generalization abilities. In this paper, we focus on a variant of SAM known as mSAM, which, during training, averages the updates generated by adversarial perturbations across several disjoint shards of a mini-batch. Recent work suggests that mSAM can outperform SAM in terms of test accuracy. However, a comprehensive empirical study of mSAM is missing from the literature -- previous results have mostly been limited to specific architectures and datasets. To that end, this paper presents a thorough empirical evaluation of mSAM on various tasks and datasets. We provide a flexible implementation of mSAM and compare the generalization performance of mSAM to the performance of SAM and vanilla training on different image classification and natural language processing tasks. We also conduct careful experiments to understand the computational cost of training with mSAM, its sensitivity to hyperparameters and its correlation with the flatness of the loss landscape. Our analysis reveals that mSAM yields superior generalization performance and flatter minima, compared to SAM, across a wide range of tasks without significantly increasing computational costs.
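
The mSAM update described above can be sketched in a few lines of plain Python. This is an illustrative sketch and not the authors' implementation: the linear model, the squared-error loss, the strided sharding, and the names `grad_mse`, `msam_step`, `lr`, and `rho` are assumptions chosen to keep the example self-contained. With `m=1` the step reduces to a standard SAM step on the whole mini-batch.

```python
import math

def grad_mse(w, shard):
    """Gradient of the mean squared error of a linear model on one shard."""
    g = [0.0] * len(w)
    for x, y in shard:
        err = sum(wi * xi for wi, xi in zip(w, x)) - y
        for j, xj in enumerate(x):
            g[j] += 2.0 * err * xj / len(shard)
    return g

def msam_step(w, batch, m, lr=0.1, rho=0.05, grad_fn=grad_mse):
    """One mSAM update: average the SAM gradients of m disjoint shards."""
    shards = [batch[i::m] for i in range(m)]  # disjoint split of the mini-batch
    avg = [0.0] * len(w)
    for shard in shards:
        g = grad_fn(w, shard)
        norm = math.sqrt(sum(gi * gi for gi in g)) or 1.0  # guard against a zero gradient
        # Adversarial (ascent) perturbation computed from this shard only.
        w_adv = [wi + rho * gi / norm for wi, gi in zip(w, g)]
        g_adv = grad_fn(w_adv, shard)
        avg = [a + ga / m for a, ga in zip(avg, g_adv)]
    return [wi - lr * a for wi, a in zip(w, avg)]

# Hypothetical toy data generated by y = 2x, fitted with a single weight.
batch = [([x / 10.0], 2.0 * x / 10.0) for x in range(1, 9)]
w = [0.0]
for _ in range(200):
    w = msam_step(w, batch, m=4)
print(w)
```

On this toy problem the weight should settle close to 2.0, hovering slightly around it because each shard's gradient is evaluated at a perturbed point rather than at `w` itself.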
